{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a8d3ff0b",
   "metadata": {},
   "source": [
    "# Document Translation Practice Based on ERNIE 4.5 and PaddleOCR\n",
    "\n",
    "## 1. Background Overview\n",
    "\n",
    "Against the backdrop of globalization, the demand for cross-lingual communication is increasing, highlighting the growing importance of translation tasks. Especially with the acceleration of digitalization, the need to translate document images is on the rise. However, document image translation faces unique challenges.\n",
    "\n",
    "First, document images often contain complex layouts with text, charts, tables, and other elements, which makes layout analysis difficult. Traditional OCR alone often fails to accurately extract text and preserve the original formatting of complex layouts, so dedicated layout analysis techniques are needed to extract the content of document images. Document image analysis is a technology for extracting structured information from document images, converting complex layouts into machine-readable data. By combining optical character recognition (OCR), image processing, and machine learning algorithms, it can identify and extract text blocks, headings, paragraphs, images, tables, and other layout elements, producing structured document data that improves the efficiency and accuracy of downstream processing. It is widely used in document management, information extraction, and data digitization. PP-StructureV3, a leading document analysis tool developed by PaddlePaddle, offers enhanced layout region detection, table recognition, and formula recognition, and adds chart understanding, multi-column reading-order restoration, and conversion of results to Markdown files. It performs well on many types of document data and can handle relatively complex documents.\n",
    "\n",
    "In addition, multilingual translation itself is challenging: languages differ significantly in grammar, vocabulary, and cultural background, and traditional translation tools often struggle with long sentences and context-dependent passages. A document image translation tool must therefore both analyze the layout accurately and produce fluent, faithful multilingual translations, which calls for the capabilities of large language models. This tutorial provides a practical guide to document translation using PaddleOCR and ERNIE 4.5.\n",
    "\n",
    "The workflow is shown in the diagram below: First, use PP-StructureV3 to analyze the content of the document image and obtain a structured data representation. Then, process this into a Markdown-formatted document file. Finally, use prompt engineering to construct prompts and call ERNIE 4.5 to translate the document content. This approach can not only accurately recognize and analyze complex document layouts but also achieve high-quality multilingual translation services, meeting users’ document translation needs in different language environments.\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/doc_translation/pp_doctranslation.png\" width=\"800\"/>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5e71e90",
   "metadata": {},
   "source": [
    "## 2. Environment Preparation\n",
    "\n",
    "### 2.1 Install the PaddlePaddle Framework\n",
    "\n",
    "In this example, multiple Paddle deep learning models will be used to accomplish document content recognition and translation. Therefore, you need to install the PaddlePaddle framework first. Please refer to the [installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) to complete the installation. Here is an example command:\n",
    "\n",
    "`python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/`\n",
    "\n",
    "### 2.2 Deploy ERNIE 4.5 and Configure Key Parameters\n",
    "\n",
    "In this example, the ERNIE large language model is accessed via service requests, so it needs to be deployed as a local service. You can deploy the ERNIE model using the FastDeploy tool. FastDeploy is an open-source large model inference and deployment tool developed by PaddlePaddle. Please refer to the [FastDeploy official documentation](https://github.com/PaddlePaddle/FastDeploy) for deployment instructions.\n",
    "\n",
    "Additionally, to test the availability of the translation service after deployment, you need to install the OpenAI SDK. Run the following command:\n",
    "\n",
    "`pip install openai`\n",
    "\n",
    "After deploying FastDeploy as a background service, fill in the service URL in the configuration below and run the script to test the service. If the output contains \"Test succeeded!\", the service is available; otherwise, troubleshoot according to the error message."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d443d98",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Please fill in the URL of the local service below, e.g., http://0.0.0.0:8000/v1\n",
    "ERNIE_URL = \"\"\n",
    "\n",
    "try:\n",
    "    import openai\n",
    "\n",
    "    client = openai.OpenAI(base_url=ERNIE_URL, api_key=\"api_key\")\n",
    "    question = \"Who are you?\"\n",
    "    response = client.chat.completions.create(\n",
    "        model=\"xxx\", messages=[{\"role\": \"user\", \"content\": question}]\n",
    "    )\n",
    "    reply = response.choices[0].message.content\n",
    "except Exception as e:\n",
    "    print(f\"Test failed! The error message is:\\n{e}\")\n",
    "else:\n",
    "    print(f\"Test succeeded!\\nThe question is: {question}\\nThe answer is: {reply}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff695111",
   "metadata": {},
   "source": [
    "### 2.3 Install PaddleOCR\n",
    "\n",
    "The PP-DocTranslation document translation toolkit used in this example is integrated within PaddleOCR, so PaddleOCR needs to be installed.\n",
    "\n",
    "PaddleOCR is a leading open-source document image analysis tool released by PaddlePaddle. Since its launch, it has become popular among academia, industry, and research communities due to its cutting-edge algorithms and practical industry applications. It has been widely adopted by many well-known open-source projects, such as Umi-OCR, OmniParser, MinerU, RAGFlow, etc., and has become the first choice for developers in the open-source OCR field. PaddleOCR not only integrates a large number of excellent algorithms and models for OCR and layout analysis, but also provides production-ready pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4. By integrating multiple expert models, it can provide end-to-end solutions for specific problems.\n",
    "\n",
    "PaddleOCR offers a precompiled Python package that can be installed with a single command as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26dab7b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install paddleocr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b6d9fd2",
   "metadata": {},
   "source": [
    "## 3. Quick Experience\n",
    "\n",
    "First, parse the document image to obtain its structured representation. Fill in the path of the document image to be translated, the save path for the prediction results, and the target language for translation in the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81055105",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in the path of the document to be predicted; image and PDF files are supported, as local paths or network URLs\n",
    "input_path = \"\"\n",
    "\n",
    "# Fill in the save path for the prediction results:\n",
    "output_path = \"./output/\"\n",
    "\n",
    "# Target language for translation\n",
    "# Supports language codes defined by ISO 639-1\n",
    "# For example, \"en\" for English, \"ja\" for Japanese, and \"fr\" for French\n",
    "target_language = \"en\""
   ]
  },
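  {
   "cell_type": "markdown",
   "id": "3f2a1b9c",
   "metadata": {},
   "source": [
    "As an optional sanity check on the configuration above, the helper below verifies that a language code looks like ISO 639-1 (exactly two lowercase ASCII letters). This is an illustrative snippet; the `looks_like_iso639_1` function is defined here for demonstration and is not part of PaddleOCR:\n",
    "\n",
    "```python\n",
    "def looks_like_iso639_1(code):\n",
    "    # ISO 639-1 codes are exactly two lowercase ASCII letters, e.g. 'en', 'ja', 'fr'\n",
    "    return len(code) == 2 and code.isascii() and code.isalpha() and code.islower()\n",
    "\n",
    "for code in ('en', 'ja', 'fr'):\n",
    "    assert looks_like_iso639_1(code)\n",
    "assert not looks_like_iso639_1('eng')  # three-letter codes are ISO 639-2, not 639-1\n",
    "assert not looks_like_iso639_1('EN')\n",
    "```"
   ]
  },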
  {
   "cell_type": "markdown",
   "id": "882640e9",
   "metadata": {},
   "source": [
    "After completing the above configurations, you can instantiate the PP-DocTranslation production line of PaddleOCR with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69d2ad54",
   "metadata": {},
   "outputs": [],
   "source": [
    "from paddleocr import PPDocTranslation\n",
    "\n",
    "translation_engine = PPDocTranslation(\n",
    "    use_doc_orientation_classify=False,\n",
    "    use_doc_unwarping=False,\n",
    "    use_seal_recognition=True,\n",
    "    use_table_recognition=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee5f1c0e",
   "metadata": {},
   "source": [
    "Pay attention to the prediction parameters, which should be configured according to the actual documents:\n",
    "* `use_doc_orientation_classify`: whether to use the document image orientation classification model;\n",
    "* `use_doc_unwarping`: whether to use the document image distortion correction model;\n",
    "* `use_seal_recognition`: whether to use the seal recognition model;\n",
    "* `use_table_recognition`: whether to use the table recognition model.\n",
    "\n",
    "For more parameter descriptions, please refer to the parameter [documentation](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0/docs/pipeline_usage/tutorials/ocr_pipelines/PP-DocTranslation.md).\n",
    "\n",
    "After instantiating the pipeline, you can call the `visual_predict()` method of the pipeline to parse the document image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1cc9a51",
   "metadata": {},
   "outputs": [],
   "source": [
    "visual_predict_res = translation_engine.visual_predict(input_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "698c72f7",
   "metadata": {},
   "source": [
    "The `visual_predict()` method returns a generator: each iteration parses and predicts one page of the input. For a multi-page PDF, loop over the generator to parse every page, collect the per-page results, and concatenate them into the prediction result for the entire document, which is then saved to the specified path:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0de81c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "ori_md_info_list = []\n",
    "for res in visual_predict_res:\n",
    "    layout_parsing_result = res[\"layout_parsing_result\"]\n",
    "    ori_md_info_list.append(layout_parsing_result.markdown)\n",
    "    layout_parsing_result.save_to_img(output_path)\n",
    "    layout_parsing_result.save_to_markdown(output_path)\n",
    "\n",
    "if input_path.lower().endswith(\".pdf\"):\n",
    "    ori_md_info = translation_engine.concatenate_markdown_pages(ori_md_info_list)\n",
    "    ori_md_info.save_to_markdown(output_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9332bf24",
   "metadata": {},
   "source": [
    "After obtaining the parsing results of the entire document image, you can invoke a large model to complete content translation and save the translation results to a specified path. The relevant code is as follows. Before running, you also need to fill in the ERNIE large model service URL in the code below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4923014",
   "metadata": {},
   "outputs": [],
   "source": [
    "chat_bot_config = {\n",
    "    \"module_name\": \"chat_bot\",\n",
    "    \"model_name\": \"xxx\",\n",
    "    # Please fill in the URL of the ERNIE large model service below, e.g., http://0.0.0.0:8000/v1\n",
    "    \"base_url\": \"Please fill in the URL of the local service\",\n",
    "    \"api_type\": \"openai\",\n",
    "    \"api_key\": \"api_key\"\n",
    "}\n",
    "\n",
    "tgt_md_info_list = translation_engine.translate(\n",
    "    ori_md_info_list=ori_md_info_list,\n",
    "    target_language=target_language,\n",
    "    chunk_size=3000,\n",
    "    chat_bot_config=chat_bot_config,\n",
    ")\n",
    "for tgt_md_info in tgt_md_info_list:\n",
    "    tgt_md_info.save_to_markdown(output_path)"
   ]
  },
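  {
   "cell_type": "markdown",
   "id": "7e4d2c8a",
   "metadata": {},
   "source": [
    "The `chunk_size` argument above bounds how much Markdown text is sent to the large model in a single request. Conceptually, chunking can be sketched as follows; this is an illustrative simplification that greedily packs whole paragraphs, not PP-DocTranslation's actual implementation:\n",
    "\n",
    "```python\n",
    "def chunk_markdown(text, chunk_size=3000):\n",
    "    # Greedily pack whole paragraphs into chunks of at most chunk_size characters\n",
    "    chunks, current = [], ''\n",
    "    for para in text.split('\\n\\n'):\n",
    "        candidate = current + '\\n\\n' + para if current else para\n",
    "        if len(candidate) <= chunk_size:\n",
    "            current = candidate\n",
    "        else:\n",
    "            if current:\n",
    "                chunks.append(current)\n",
    "            current = para  # an oversized paragraph becomes its own chunk\n",
    "    if current:\n",
    "        chunks.append(current)\n",
    "    return chunks\n",
    "\n",
    "print(chunk_markdown('alpha\\n\\nbeta\\n\\ngamma', chunk_size=12))\n",
    "```\n",
    "\n",
    "Smaller chunks reduce the risk of exceeding the model's context window, while larger chunks give the model more surrounding context for consistent terminology."
   ]
  },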
  {
   "cell_type": "markdown",
   "id": "a944d25b",
   "metadata": {},
   "source": [
    "An example translation result is shown below (left: the original English PDF page; right: the Markdown file translated into Chinese).\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/doc_translation/PP-DocTranslation_demo.jpg\" width=\"800\"/>\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c96332e",
   "metadata": {},
   "source": [
    "## 4. Summary\n",
    "\n",
    "This tutorial first introduced the background of document translation as well as the current challenges and issues faced in the field. Traditional OCR technology struggles with accurately extracting text and parsing complex layout structures, while differences in grammatical structure, vocabulary usage, and cultural background further increase the difficulty of high-quality multilingual document translation. This tutorial explained how to leverage the document image analysis capabilities of PP-StructureV3 and the translation capabilities of ERNIE 4.5 to achieve a high-quality document translation solution. Finally, the tutorial provided detailed instructions on environment setup and, with just a few lines of code, helped users quickly experience the document translation workflow, ending with a sample translation result."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
